The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice and the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while receiving prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
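Patch-based training was the most common answer to oversized samples. As a minimal illustration (not taken from any surveyed solution; the function names and the "shift the last patch to the border" policy are assumptions), the sketch below computes the start corners of overlapping 3D patches that tile a volume. Training would sample patches from such a grid, and inference would stitch the per-patch predictions back together.

```python
from itertools import product


def patch_starts(length, patch, stride):
    """Start offsets along one axis so that patches of size `patch` cover `length`."""
    if patch >= length:
        return [0]  # a single (possibly padded) patch covers the whole axis
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] + patch < length:
        # shift one extra patch flush with the border so no voxels are skipped
        starts.append(length - patch)
    return starts


def patch_grid(shape, patch, stride):
    """All (z, y, x) start corners of overlapping patches tiling a 3D volume."""
    axes = [patch_starts(n, p, s) for n, p, s in zip(shape, patch, stride)]
    return list(product(*axes))


# Example: 128^3 patches with 50% overlap over a hypothetical 300 x 512 x 512 CT volume.
corners = patch_grid((300, 512, 512), (128, 128, 128), (64, 64, 64))
```

Overlapping strides trade extra computation for smoother stitched predictions at patch borders.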
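Neural message passing is a basic feature-extraction unit for graph-structured data that accounts for the influence of neighboring node features as the network propagates from one layer to the next. We model this process with an interacting particle system with attractive and repulsive forces, together with the Allen-Cahn force that arises in the modeling of phase transitions. The system is a reaction-diffusion process that can separate particles into different clusters. This induces Allen-Cahn message passing (ACMP) for graph neural networks, in which the numerical iteration of the system's solution constitutes the message propagation. The mechanism behind ACMP is the phase transition of the particles, which enables the formation of multiple clusters and thus supports GNN prediction for node classification. ACMP can push the network depth to hundreds of layers, with a theoretically proven strictly positive lower bound on the Dirichlet energy. It therefore provides a deep GNN model that avoids the common GNN problem of oversmoothing. Experiments on various real-world node classification datasets with high homophily difficulty show that GNNs with ACMP achieve state-of-the-art performance without decay of the Dirichlet energy.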
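Here, message passing is the numerical iteration of a particle system. The sketch below is a minimal explicit-Euler step under stated assumptions: the update $\dot{x}_i = \alpha \sum_j (a_{ij} - \beta)(x_j - x_i) + \delta\, x_i (1 - x_i^2)$, the fixed couplings `a_ij`, and the parameter names `alpha`, `beta`, `delta`, `dt` are illustrative choices, not the paper's exact parameterization (in a GNN the couplings would be learned). Couplings above `beta` attract neighbors and couplings below it repel them, while the Allen-Cahn term pulls each coordinate toward -1 or +1, which is what produces separate clusters.

```python
def acmp_step(x, edges, weights, alpha=1.0, beta=0.5, delta=1.0, dt=0.1):
    """One explicit-Euler step of an Allen-Cahn-type particle system on a graph.

    x: list of node feature vectors (lists of floats)
    edges: directed (i, j) pairs; node i receives a message from node j
    weights: dict mapping (i, j) to a nonnegative coupling a_ij (assumed fixed here)
    """
    dim = len(x[0])
    dx = [[0.0] * dim for _ in x]
    for i, j in edges:
        # a_ij > beta attracts x_i toward x_j; a_ij < beta pushes it away
        coeff = alpha * (weights[(i, j)] - beta)
        for k in range(dim):
            dx[i][k] += coeff * (x[j][k] - x[i][k])
    new_x = []
    for i, xi in enumerate(x):
        # Allen-Cahn reaction term delta * x * (1 - x^2): stable wells at -1 and +1
        new_x.append([xi[k] + dt * (dx[i][k] + delta * xi[k] * (1.0 - xi[k] ** 2))
                      for k in range(dim)])
    return new_x


# Each step plays the role of one layer; stacking many steps gives a deep model.
state = [[0.5], [-0.5]]
for _ in range(200):
    state = acmp_step(state, [], {})
```

Starting from 0.5 and -0.5, the two isolated nodes settle near +1 and -1 instead of collapsing to a common value.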
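Many real-world relational systems, such as social networks and biological systems, contain dynamic interactions. When learning dynamic graph representations, it is essential to exploit both continuous-time information and geometric structure. Mainstream work achieves topological embedding through message-passing networks (e.g., GCN, GAT). Temporal evolution, on the other hand, is usually expressed through memory units (e.g., LSTM or GRU) whose gating mechanisms provide convenient information filtering. However, such designs prevent large-scale input sequences because of their overly complex encoding. This work learns from the philosophy of self-attention and proposes an efficient spectral-based neural unit that employs informative long-range temporal interactions. The developed Spectral Window Unit (SWINIT) model predicts scalable dynamic graphs with assured efficiency. The architecture is assembled from a few simple, efficient computational blocks: randomized SVD, MLP, and graph framelet convolution. The SVD-plus-MLP module encodes the long-term feature evolution of dynamic graph events. The fast framelet graph transform in the framelet convolution embeds the structural dynamics. Both strategies enhance the model's capacity for scalable analysis. In particular, the iterative SVD approximation shrinks the computational complexity of attention to $O(Nd)$ for a dynamic graph with $N$ edges and $d$ edge features, and the multiscale transform of the framelet convolution allows sufficient scalability in network training. Our SWINIT achieves state-of-the-art performance on a variety of online continuous-time dynamic graph learning tasks while using up to seven times fewer learnable parameters than baseline methods.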
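Randomized SVD is one of SWINIT's building blocks. The sketch below is a generic Gaussian range-finder randomized SVD (in the style of Halko, Martinsson, and Tropp), not SWINIT's own implementation; the `oversample` and `n_iter` defaults are assumptions. For a fixed target rank, every matrix product costs time linear in the number of rows, which is the intuition behind attention costs that scale linearly with the number of edges.

```python
import numpy as np


def randomized_svd(a, rank, oversample=5, n_iter=2, seed=0):
    """Approximate rank-`rank` SVD of matrix `a` via a Gaussian range finder."""
    rng = np.random.default_rng(seed)
    m, n = a.shape
    k = min(rank + oversample, m, n)
    omega = rng.standard_normal((n, k))   # random test matrix
    y = a @ omega                         # sample the column space of a
    for _ in range(n_iter):               # power iterations sharpen the spectrum
        y = a @ (a.T @ y)
    q, _ = np.linalg.qr(y)                # orthonormal basis (m x k) for the sampled range
    b = q.T @ a                           # small k x n projection of a
    u_small, s, vt = np.linalg.svd(b, full_matrices=False)
    u = q @ u_small                       # lift left singular vectors back to m dims
    return u[:, :rank], s[:rank], vt[:rank]
```

Because only the small `k x n` matrix `b` is decomposed exactly, the expensive step is the handful of products with `a`.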
Learning good representations of gigapixel whole slide pathology images (WSIs) for downstream tasks is critical. Previous studies employ multiple instance learning (MIL) to represent WSIs as bags of sampled patches because, in most cases, only slide-level labels are available and only a tiny region of the WSI is disease-positive. However, WSI representation learning remains an open problem for two reasons: (1) patches sampled at a higher resolution may be incapable of depicting microenvironment information such as the relative position between tumor cells and surrounding tissues, while patches at a lower resolution lose fine-grained detail; (2) extracting patches from a giant WSI results in a large bag size, which tremendously increases the computational cost. To solve these problems, this paper proposes a hierarchical multimodal transformer framework that learns a hierarchical mapping between pathology images and the corresponding genes. Specifically, we randomly extract instance-level patch features from WSIs at different magnifications. A co-attention mapping between imaging and genomics is then learned to uncover their pairwise interactions and reduce the space complexity of the imaging features. Such early fusion makes it computationally feasible to use a MIL Transformer for the survival prediction task. Our architecture requires fewer GPU resources than benchmark methods while maintaining better WSI representation ability. We evaluate our approach on five cancer types from The Cancer Genome Atlas database and achieve an average c-index of $0.673$, outperforming state-of-the-art multimodal methods.
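To see how genomics-guided co-attention can shrink the imaging bag, consider the one-directional cross-attention sketched below, in which G genomic embeddings act as queries over M patch features. This is an illustrative assumption, not the paper's exact module; the function name and projection matrices are hypothetical. The output has G rows instead of M, so a downstream MIL Transformer's self-attention costs on the order of G² rather than M².

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def genomic_guided_coattention(gene_emb, patch_feats, w_q, w_k, w_v):
    """Genes (G x d) attend over WSI patches (M x d).

    Returns gene-conditioned imaging tokens (G x d_v) and the attention map (G x M).
    """
    q = gene_emb @ w_q          # queries from genomic features
    k = patch_feats @ w_k       # keys from patch features
    v = patch_feats @ w_v       # values from patch features
    attn = softmax(q @ k.T / np.sqrt(k.shape[1]), axis=1)
    return attn @ v, attn


# Hypothetical sizes: 6 genomic groups, 1000 sampled patches, 32-dim features.
rng = np.random.default_rng(0)
genes = rng.standard_normal((6, 32))
patches = rng.standard_normal((1000, 32))
w_q, w_k, w_v = (rng.standard_normal((32, 16)) for _ in range(3))
tokens, attn = genomic_guided_coattention(genes, patches, w_q, w_k, w_v)
```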
Prior work on ideology prediction has largely focused on single modalities, i.e., text or images. In this work, we introduce the task of multimodal ideology prediction, where a model predicts binary or five-point scale ideological leanings, given a text-image pair with political content. We first collect five new large-scale datasets with English documents and images along with their ideological leanings, covering news articles from a wide range of US mainstream media and social media posts from Reddit and Twitter. We conduct in-depth analyses of news articles and reveal differences in image content and usage across the political spectrum. Furthermore, we perform extensive experiments and ablation studies, demonstrating the effectiveness of targeted pretraining objectives on different model components. Our best-performing model, a late-fusion architecture pretrained with a triplet objective over multimodal content, outperforms the state-of-the-art text-only model by almost 4% and a strong multimodal baseline with no pretraining by over 3%.
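The best model is pretrained with a triplet objective. A minimal sketch of the standard hinge triplet loss follows, assuming Euclidean distance and a margin of 1.0. How the authors form anchor, positive, and negative examples from multimodal content is not specified in the abstract. Here a "positive" simply means an example with the same leaning and a "negative" one with a different leaning (a hypothetical choice).

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss: zero once the negative is at least `margin` farther than the positive."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def mean_triplet_loss(triplets, margin=1.0):
    """Average loss over a batch of (anchor, positive, negative) embedding triplets."""
    return sum(triplet_loss(a, p, n, margin) for a, p, n in triplets) / len(triplets)
```

Minimizing this loss pulls same-leaning embeddings together and pushes different-leaning ones apart, before any classifier is trained on top.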
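End-to-end communication using deep autoencoders (DAEs) in multiple-input multiple-output (MIMO) systems is a new concept with significant potential. DAE-aided MIMO has been shown to outperform singular value decomposition (SVD)-based precoded MIMO in terms of bit error rate (BER). This paper proposes embedding the left and right singular vectors of the channel matrix into the DAE encoder and decoder to further improve the performance of MIMO spatial multiplexing. The SVD-embedded DAE largely outperforms theoretical linear precoding in terms of BER. This is remarkable because it shows that the proposed DAEs go beyond the limits of current system design by treating the communication system as a single end-to-end optimization block. Simulation results show that, at SNR = 10 dB, the proposed SVD-embedded design reduces the BER by at least a factor of 10 compared with the DAE without SVD, and by up to a factor of 18 compared with theoretical linear precoding. We attribute this to the ability of the proposed DAE to match the input and output to an adaptive modulation structure with finite-alphabet input. We also observe that residual connections added to the DAE further improve performance.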
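For reference, the classical SVD-based precoding that the DAE builds on works as follows. With $H = U \Sigma V^H$, the transmitter sends $x = V s$ and the receiver applies $U^H$, which yields $U^H H V s = \Sigma s$: a set of parallel, non-interfering subchannels. The sketch below demonstrates this with NumPy on a noiseless channel. It is not the proposed DAE, and the channel model and QPSK symbols are assumptions.

```python
import numpy as np


def svd_precode_and_combine(h, symbols, noise=None):
    """Precode with V, pass through channel h, combine with U^H, then equalize per subchannel."""
    u, s, vh = np.linalg.svd(h, full_matrices=False)
    x = vh.conj().T @ symbols            # transmit precoding x = V s
    y = h @ x if noise is None else h @ x + noise
    z = u.conj().T @ y                   # receive combining: z = diag(s) @ symbols (+ noise)
    return z / s                         # undo the per-stream gain


# Hypothetical 4x4 Rayleigh-fading channel and one QPSK symbol per spatial stream.
rng = np.random.default_rng(0)
h = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
symbols = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
recovered = svd_precode_and_combine(h, symbols)
```

Without noise the symbols are recovered exactly. With noise, subchannels with small singular values amplify it, which is one limitation that learned end-to-end designs try to work around.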
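Clinical question answering (QA) aims to automatically answer questions from medical professionals based on clinical texts. Studies show that neural QA models trained on one corpus may not generalize well to new clinical texts from a different institute or a different patient group, where large-scale QA pairs are not readily available for model retraining. To address this challenge, we propose a simple yet effective framework, CliniQG4QA, which leverages question generation (QG) to synthesize QA pairs on new clinical contexts and boosts QA models without requiring manual annotations. To generate the diverse types of questions that are essential for training QA models, we further introduce a seq2seq-based question phrase prediction (QPP) module that can be used together with most existing QG models to diversify the generation. Our comprehensive experimental results show that the QA corpus generated by our framework can improve QA models on new contexts (up to 8% absolute gain in terms of Exact Match), and that the QPP module plays a crucial role in achieving this gain.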
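The reported gain is measured in Exact Match (EM). The sketch below follows the answer normalization of the widely used SQuAD evaluation script: lowercasing, stripping punctuation, removing articles, and collapsing whitespace. Whether CliniQG4QA's evaluation uses exactly this normalization is an assumption.

```python
import re
import string

_PUNCT = set(string.punctuation)


def normalize_answer(text):
    """SQuAD-style normalization: lowercase, drop punctuation and articles, collapse spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold_answers):
    """1.0 if the prediction matches any gold answer after normalization, else 0.0."""
    return float(any(normalize_answer(prediction) == normalize_answer(g) for g in gold_answers))


def em_score(predictions, references):
    """Percentage of predictions that exactly match at least one of their gold answers."""
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return 100.0 * hits / len(predictions)
```

An "8% absolute gain" in this metric means eight more questions out of every hundred are answered with a span that matches a gold answer exactly.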
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even when the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.
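One way to make implicit alignment concrete is to give image and LiDAR tokens position encodings computed from 3D coordinates in a shared space. The sketch below back-projects an image pixel to points along its camera ray with a pinhole model and encodes them sinusoidally. A LiDAR token would be encoded from its own 3D location with the same `encode_point`. The function names, the frequency scheme, and the `scale` normalizer are hypothetical. The camera-to-LiDAR extrinsic transform is omitted, and this is not claimed to be CMT's exact encoder.

```python
import math


def pixel_ray_points(u, v, depths, fx, fy, cx, cy):
    """Back-project pixel (u, v) to camera-frame 3D points at the given depths (pinhole model)."""
    return [((u - cx) * d / fx, (v - cy) * d / fy, d) for d in depths]


def encode_point(xyz, num_freqs=4, scale=50.0):
    """Sinusoidal encoding of a 3D point; `scale` is an assumed range normalizer in meters."""
    feats = []
    for coord in xyz:
        c = coord / scale
        for f in range(num_freqs):
            freq = (2.0 ** f) * math.pi
            feats.append(math.sin(freq * c))
            feats.append(math.cos(freq * c))
    return feats


def image_token_position(u, v, depths, fx, fy, cx, cy, num_freqs=4):
    """Concatenate encodings of all depth hypotheses along the pixel's ray."""
    feats = []
    for point in pixel_ray_points(u, v, depths, fx, fy, cx, cy):
        feats.extend(encode_point(point, num_freqs))
    return feats
```

Because both modalities are described in terms of 3D coordinates, the transformer can relate them through attention without an explicit BEV projection of image features.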
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
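NAIVEATTACK adds triggers to the raw data before distillation begins. The sketch below shows only that poisoning step and the ASR metric, under stated assumptions: a square pixel trigger in the bottom-right corner, a fixed poisoning rate, and relabeling to a single target class are illustrative choices. The distillation algorithm itself and DOORPING's iterative trigger optimization are not shown.

```python
import random


def stamp_trigger(image, trigger, top, left):
    """Return a copy of a 2D image (list of rows) with `trigger` pasted at (top, left)."""
    out = [row[:] for row in image]
    for r, trig_row in enumerate(trigger):
        for c, value in enumerate(trig_row):
            out[top + r][left + c] = value
    return out


def poison_dataset(images, labels, trigger, target_label, rate=0.1, seed=0):
    """Stamp the trigger on a random fraction of samples and relabel them to the target class."""
    rng = random.Random(seed)
    n_poison = int(round(rate * len(images)))
    chosen = set(rng.sample(range(len(images)), n_poison))
    new_images, new_labels = [], []
    for i, (img, lab) in enumerate(zip(images, labels)):
        if i in chosen:
            # bottom-right placement is an illustrative choice
            top, left = len(img) - len(trigger), len(img[0]) - len(trigger[0])
            new_images.append(stamp_trigger(img, trigger, top, left))
            new_labels.append(target_label)
        else:
            new_images.append(img)
            new_labels.append(lab)
    return new_images, new_labels


def attack_success_rate(predictions, target_label):
    """Fraction of triggered test inputs that the model assigns to the target label."""
    return sum(p == target_label for p in predictions) / len(predictions)
```

The poisoned set would then be passed to the distillation procedure in place of the clean data, so the trigger is baked into the synthetic samples that downstream models train on.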
Few-Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS and its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at two levels: the feature level and the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features, and then propose to link object queries for better calibration via cross-attention. With these steps, performance on novel classes improves significantly over our strong baseline. Additionally, our framework can be easily extended to incremental FSIS with minor modifications. When benchmarking on the COCO dataset in the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared with existing approaches across different shots; e.g., it boosts nAP by +8.2/+9.4 over the current state-of-the-art FSIS method in the 10/30-shot settings. We further demonstrate the superiority of our approach on Few-Shot Object Detection. Code and models will be made available.
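The first insight, using support masks to build class centers that re-weight query features, resembles the masked average pooling common in few-shot segmentation. The sketch below is a hedged illustration of that general idea. The sigmoid channel gate and the function names are assumptions, and RefT's actual dynamic weighting module may differ.

```python
import numpy as np


def masked_average_pooling(support_feats, support_mask, eps=1e-6):
    """Class center from support features (C x H x W) and a binary object mask (H x W)."""
    weighted = support_feats * support_mask[None]        # zero out background positions
    return weighted.sum(axis=(1, 2)) / (support_mask.sum() + eps)


def reweight_query(query_feats, class_center):
    """Channel-wise re-weighting of query features (C x H x W) by a sigmoid-gated class center."""
    gate = 1.0 / (1.0 + np.exp(-class_center))
    return query_feats * gate[:, None, None]


# Two-channel toy features: channel 0 has large background values that the mask must ignore.
f0 = np.ones((4, 4))
f0[2:] = 100.0
feats = np.stack([f0, 2.0 * np.ones((4, 4))])
mask = np.zeros((4, 4))
mask[:2] = 1.0                                           # object occupies the top half
center = masked_average_pooling(feats, mask)
```

The mask keeps background activations out of the class center, so the re-weighting emphasizes channels that respond to the object itself.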